Final Project

Brad Harbans

Overview

I will be using the AQUASTAT database, which is provided by the Food and Agriculture Organization of the United Nations (FAO). The database provides information about water and land resources for many countries, which can be grouped by region or income level.

Data Source

The primary source is the AQUASTAT database, which is provided by the Food and Agriculture Organization of the United Nations. The entire dataset takes approximately 1 TB of disk space, so I used the interactive tool on the FAO website to download a subset of the overall repository. In particular, I selected population data; the prevalence and number of people undernourished, to be used as a proxy for poverty; and data about access to safe drinking water. The data are restricted to the last 4 years.

I will begin by reading the data into pandas:
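A minimal sketch of this loading step is shown here. It assumes the download is a long-format CSV with one row per country, variable, and year; the column names (`Area`, `Variable Name`, `Year`, `Value`) and every value in the sample are placeholders for illustration, not the real export.

```python
import io

import pandas as pd

# Placeholder rows standing in for the downloaded file. The column names and
# the values are assumptions for illustration, not the real AQUASTAT export.
SAMPLE_CSV = """Area,Variable Name,Year,Value
Andorra,Total population,2017,1.0
Andorra,Population with access to safe drinking water (%),2017,2.0
Country B,Total population,2017,3.0
Country B,Prevalence of undernourishment (%),2017,
"""


def load_fao(source):
    """Read a long-format export: one row per (area, variable, year)."""
    df = pd.read_csv(source)
    # Coerce so that blanks or stray symbols become NaN rather than strings.
    df["Value"] = pd.to_numeric(df["Value"], errors="coerce")
    return df


fao_long = load_fao(io.StringIO(SAMPLE_CSV))
print(fao_long.head())
```

With the real download, `load_fao` would be given the path to the CSV file instead of the in-memory sample.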

I will also use data from the World Bank, through the wbdata interface, to obtain information about each country, such as its income level and region. A large amount of data is available through this interface.
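The country metadata can be flattened into a table as sketched here. The records are written by hand in the nested shape used by the World Bank country listing (a dictionary for `region` and for `incomeLevel`); whether wbdata returns exactly this shape is an assumption, so the fetching call is left out and only the pandas step is shown.

```python
import pandas as pd

# Hand-written records in the nested shape of a World Bank country listing.
# The exact structure returned by wbdata is an assumption here.
records = [
    {
        "id": "AND",
        "name": "Andorra",
        "region": {"value": "Europe & Central Asia"},
        "incomeLevel": {"value": "High income"},
    },
    {
        "id": "KEN",
        "name": "Kenya",
        "region": {"value": "Sub-Saharan Africa"},
        "incomeLevel": {"value": "Lower middle income"},
    },
]

# json_normalize expands nested dictionaries into dotted column names,
# which are then renamed to flat, readable ones.
countries = pd.json_normalize(records).rename(
    columns={"region.value": "region", "incomeLevel.value": "income_level"}
)[["id", "name", "region", "income_level"]]

print(countries)
```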

I will now show the data from FAO for Andorra as an example.
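Selecting a single country from the long-format table is a boolean filter. The sketch uses a small stand-in table with assumed column names and placeholder values.

```python
import pandas as pd

# Stand-in for the long-format FAO table; names and values are placeholders.
fao_long = pd.DataFrame(
    {
        "Area": ["Andorra", "Andorra", "Country B"],
        "Variable Name": [
            "Total population",
            "Prevalence of undernourishment (%)",
            "Total population",
        ],
        "Year": [2017, 2017, 2017],
        "Value": [1.0, None, 2.0],
    }
)

# Keep only the rows whose Area column equals "Andorra".
andorra = fao_long[fao_long["Area"] == "Andorra"]
print(andorra)
```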

I will now convert the data to wide format so that each row represents a country.
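One way to do this reshaping is with `pivot_table`, as in the sketch here. Because the data cover several years, each cell has to be reduced to one value; taking the most recent available year is an assumption made for this example, as are the column names and placeholder values.

```python
import pandas as pd

# Stand-in long-format table; names and values are placeholders.
fao_long = pd.DataFrame(
    {
        "Area": ["Country A", "Country A", "Country A", "Country B"],
        "Variable Name": [
            "Total population",
            "Total population",
            "Safe water access (%)",
            "Total population",
        ],
        "Year": [2016, 2017, 2017, 2017],
        "Value": [10.0, 11.0, 90.0, 20.0],
    }
)

# Sort by year so that "last" picks the most recent value in each group,
# then spread the variables across columns: one row per country.
wide = (
    fao_long.sort_values("Year")
    .pivot_table(index="Area", columns="Variable Name", values="Value", aggfunc="last")
    .reset_index()
)
wide.columns.name = None  # drop the leftover "Variable Name" axis label

print(wide)
```

Variables that a country never reports show up as NaN in the wide table, which is where the missing data discussed later comes from.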

I performed an inner join between the two sources. Please note that, because the two sources name some countries differently, the join drops the data for those countries.
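The effect of the name mismatch can be checked with an outer join and pandas' `indicator` option, as sketched here. The column names and numeric values are placeholders; the Bolivia spellings illustrate the kind of difference that causes a row to drop.

```python
import pandas as pd

# Stand-in tables; column names and numeric values are placeholders.
fao_wide = pd.DataFrame(
    {
        "Area": ["Andorra", "Bolivia (Plurinational State of)"],
        "safe_water_pct": [1.0, 2.0],
    }
)
wb = pd.DataFrame(
    {
        "name": ["Andorra", "Bolivia"],
        "income_level": ["High income", "Lower middle income"],
    }
)

# The inner join keeps only countries whose names match exactly.
merged = fao_wide.merge(wb, left_on="Area", right_on="name", how="inner")

# An outer join with indicator=True labels every row as "both",
# "left_only", or "right_only", which exposes the names that failed to match.
audit = fao_wide.merge(
    wb, left_on="Area", right_on="name", how="outer", indicator=True
)
dropped = audit.loc[audit["_merge"] != "both", ["Area", "name", "_merge"]]

print(merged)
print(dropped)
```

The `dropped` table could then be used to build a manual name mapping before repeating the inner join.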

As can be seen in the parallel categories plot, low-income and lower-middle-income countries have less access to safe drinking water. Sub-Saharan Africa has the least access to safe drinking water.

Unfortunately, because so much of the data is missing, it is hard to see the connection between undernourishment and a country's income level. However, the brighter yellow corresponds to a higher prevalence of undernourishment, and this color is mostly present in Sub-Saharan Africa.
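The extent of the missing data can be quantified per income level, as in the sketch here. The column names and values are placeholders chosen only to show the calculation.

```python
import pandas as pd

# Stand-in merged table; names and values are placeholders.
df = pd.DataFrame(
    {
        "income_level": ["Low income", "Low income", "High income", "High income"],
        "undernourishment_pct": [None, 30.0, None, None],
    }
)

# isna() gives True/False per row; the mean of booleans within each
# income level is the share of countries with no reported value.
missing_share = (
    df["undernourishment_pct"].isna().groupby(df["income_level"]).mean()
)

print(missing_share)
```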

A bubble chart shows the prevalence of undernourishment against the percentage of the population with access to safe water.